Author Verification Using Syntactic N-grams: Notebook for PAN at CLEF 2015
Abstract
This paper describes our approach to the Author Verification task at PAN 2015. Our method builds a representation of an author’s style from the information contained in dependency trees. This information is represented as syntactic n-grams, which are used to build a vector space. Using an unsupervised machine learning approach, each instance is associated with the corresponding author by means of the Jaccard distance. In this paper, we describe the features used and the unsupervised machine learning algorithm employed.
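As an illustration of the idea in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a toy dependency tree given by hand (a word list plus a head-to-dependents map), takes a syntactic n-gram to be a head-to-dependent path of n words, and compares two texts by the Jaccard distance between their n-gram sets. The names `syntactic_ngrams`, `jaccard_distance`, `known`, and `questioned` are hypothetical; a real system would obtain the trees from a dependency parser and would need a decision rule that the abstract does not specify.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy representation of a dependency tree:
# the words of a sentence plus a map from head index to dependent indices.
Tree = Tuple[List[str], Dict[int, List[int]]]


def syntactic_ngrams(tree: Tree, n: int) -> Set[Tuple[str, ...]]:
    """Collect every head-to-dependent path of exactly n words."""
    words, children = tree

    def paths_from(node: int, length: int) -> List[Tuple[str, ...]]:
        # A path of length 1 is just the word at this node.
        if length == 1:
            return [(words[node],)]
        result = []
        for child in children.get(node, []):
            for tail in paths_from(child, length - 1):
                result.append((words[node],) + tail)
        return result

    ngrams: Set[Tuple[str, ...]] = set()
    for node in range(len(words)):
        ngrams.update(paths_from(node, n))
    return ngrams


def jaccard_distance(a: Set, b: Set) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# "the cat chased a mouse": 'chased' heads 'cat' and 'mouse'.
known: Tree = (["the", "cat", "chased", "a", "mouse"], {2: [1, 4], 1: [0], 4: [3]})
# "the dog chased a mouse": same structure, one word changed.
questioned: Tree = (["the", "dog", "chased", "a", "mouse"], {2: [1, 4], 1: [0], 4: [3]})

known_bigrams = syntactic_ngrams(known, 2)
questioned_bigrams = syntactic_ngrams(questioned, 2)
distance = jaccard_distance(known_bigrams, questioned_bigrams)
print(round(distance, 3))
```

In this toy case the two sentences share two of six distinct syntactic bigrams, so the distance is 1 - 2/6. A smaller distance would suggest a closer stylistic match between the questioned document and the known author.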
Similar resources
Syntactic N-grams as Features for the Author Profiling Task: Notebook for PAN at CLEF 2015
This paper describes our approach to the Author Profiling task at PAN 2015. Our method relies on syntactic features, such as syntactic-based n-grams of various types, to predict the age, gender and personality traits of the author of a given text. In this paper, we describe the features used, the classification algorithm employed, and other general ideas concerning the expe...
Ensembles of Proximity-Based One-Class Classifiers for Author Verification: Notebook for PAN at CLEF 2014
We use ensembles of proximity-based one-class classifiers for the authorship verification task. For each document of known authorship, the one-class classifiers compare the dissimilarity between this document and the most dissimilar other document of the same authorship with the dissimilarity between this document and the questioned document. As the dissimilarity measure between documents we use Com...
A Basic Character N-gram Approach to Authorship Verification: Notebook for PAN at CLEF 2013
This paper describes our approach to the Author Identification task in the PAN 2013 evaluation lab. We use a profile-based approach with the common n-grams (CNG) method, which employs a normalized distance measure for short and unbalanced text introduced by Stamatatos [6]. We achieved 9th place with an overall F1 score of 0.6.
Vector Space Model and Overlap Metric for Author Identification: Notebook for PAN at CLEF 2013
This paper describes our entry for the Author Identification task at PAN 2013. The task was performed using a combination of the Vector Space Model [1] (VSM) and the Similarity Overlap Metric [3] (SOM) on the character n-grams extracted from the documents related to an author and the document in question. A combination of the VSM and SOM provided an overall F-measure, precision an...
Proximity Based One-class Classification with Common N-Gram Dissimilarity for Authorship Verification Task: Notebook for PAN at CLEF 2013
We describe our participation in the Author Identification task of the PAN 2013 competition. This competition task presents participants with a set of authorship verification problems. In each such problem, one is given a set of documents written by one author and a sample document; the task is to answer the question whether or not the sample document was written by the same author as the rem...
Publication date: 2015